On Residual Prediction in Voice Conversion Task
Abstract
Voice conversion is currently the subject of intensive research. A large group of existing voice conversion systems is based on RELP (residual-excited linear prediction) re-synthesis. In these systems, the speech signal is segmented pitch-synchronously and described by line spectral frequency (LSF) parameters. A transformation function is trained on pairs of identical, time-aligned utterances from the source and target speakers. The conversion function for LSFs is often derived from a probabilistic description of LSF pairs. The residual signal is also important for speech perception; it is transformed by so-called residual prediction. Usually, a suitable residual signal is estimated from the converted LSFs. We propose two alternative approaches. First, we predict the target speaker's residual signal directly from the source speaker's LSFs. Second, we combine the two approaches and use both source and converted LSFs for residual signal estimation. Within each of these methods, both probabilistic and Euclidean-metric-based descriptions of the LSF parameter space were employed. Various pairs of source and target speakers were tested. The converted speech was evaluated objectively using performance metrics, and orientation listening tests were carried out. The preliminary experiments revealed that no approach is generally better than the others. In particular cases, however, one method is usually preferable to the rest. Thus, a universal voice conversion system should decide automatically which residual prediction technique to use.
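The three residual prediction variants named in the abstract differ only in which LSF vector serves as the lookup key. The sketch below illustrates this with a plain nearest-neighbour search under the Euclidean metric. It is a minimal illustration under stated assumptions, not the paper's implementation: the function name `predict_residual`, the codebook layout, the array sizes, and the random toy data are all hypothetical, and the probabilistic variant is not shown.

```python
import numpy as np


def predict_residual(query_lsf, codebook_lsf, codebook_residuals):
    """Return the stored residual whose LSF key is closest to query_lsf.

    Hypothetical helper: closeness is plain Euclidean distance in LSF space.
    """
    dists = np.linalg.norm(codebook_lsf - query_lsf, axis=1)
    return codebook_residuals[int(np.argmin(dists))]


# Toy "training" data (random placeholders, not real speech features).
rng = np.random.default_rng(0)
n_frames, lsf_order, residual_len = 50, 10, 32
source_lsf = np.sort(rng.uniform(0.0, np.pi, (n_frames, lsf_order)), axis=1)
converted_lsf = np.sort(rng.uniform(0.0, np.pi, (n_frames, lsf_order)), axis=1)
target_residuals = rng.standard_normal((n_frames, residual_len))

# One codebook key set per variant described in the abstract.
keys = {
    "converted": converted_lsf,                          # usual approach
    "source": source_lsf,                                # first alternative
    "combined": np.hstack([source_lsf, converted_lsf]),  # second alternative
}

# Predict a residual for frame 7 under each variant.
predictions = {
    name: predict_residual(key[7], key, target_residuals)
    for name, key in keys.items()
}
```

Because each query here is itself a codebook entry, every variant returns the residual stored for frame 7. With unseen frames the variants would generally pick different residuals, which is where the comparison described in the abstract becomes meaningful.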
Similar resources
F0 transformation within the voice conversion framework
In this paper, several experiments on F0 transformation within the voice conversion framework are presented. The conversion system is based on a probabilistic transformation of line spectral frequencies and residual prediction. Three probabilistic methods of instantaneous F0 transformation are described and compared. Moreover, a new modification of inter-speaker residual prediction is proposed ...
Voice Conversion Based on Probabilistic Parameter Transformation and Extended Inter-speaker Residual Prediction
Voice conversion is a process which modifies speech produced by one speaker so that it sounds as if it is uttered by another speaker. In this paper a new voice conversion system is presented. The system requires parallel training data. By using linear prediction analysis, speech is described with line spectral frequencies and the corresponding residua. LSFs are converted together with instantan...
Voice conversion using General Regression Neural Network
The objective of a voice conversion system is to formulate a mapping function that transforms the source speaker's characteristics into those of the target speaker. In this paper, we propose a General Regression Neural Network (GRNN) based model for voice conversion. It is a single-pass learning network, which makes the training procedure fast and comparatively less time-consuming. The proposed ...
The linear transformation of LF glottal waveforms for voice conversion
Most Voice Conversion (VC) systems exploit source-filter decomposition based on linear prediction (LP) to transform spectral envelopes, incurring as a result various issues related to the oversimplification of the LP voice source model. Whilst residual prediction methods can mitigate this problem, they cannot be used to modify voice source quality. In this paper, a system which employs linear t...
First Steps Towards New Czech Voice Conversion System
In this paper, we describe initial experiments on creating a new Czech voice conversion system. Voice conversion (VC) is a process which modifies the speech signal produced by one (source) speaker so that it sounds like another (target) speaker. Using a VC technique, a new voice for a speech synthesizer can be prepared without recording a huge amount of new speech data. The transformation is de...